A Statistical Approach Designed for Finding Mathematically Defined Repeats in Shotgun Data and Determining the Length Distribution of Clone-Inserts

Authors

  • Lan Zhong
  • Kunlin Zhang
  • Xiangang Huang
  • Peixiang Ni
  • Yujun Han
  • Kai Wang
  • Jun Wang
  • Songgang Li
Abstract

The abundance of repeats, especially high-copy repeats, in the genomes of higher animals and plants makes whole genome assembly (WGA) difficult. To address this problem, we sought to identify and mask repeats prior to assembly, even at the genome survey stage. Repeats of different copy number have different probabilities of appearing in shotgun data; based on this principle, we constructed a statistical model and derived criteria for mathematically defined repeats (MDRs) at different shotgun coverages. Using these criteria, we developed the software MDRmasker to identify and mask MDRs in shotgun data. With repeats masked prior to assembly, assembly was faster and had a lower error probability. In addition, because clone-insert size affects the accuracy of repeat assembly and scaffold construction, we also used our model to design the length distribution of clone-inserts. In our simulated human and rice genomes, the length distributions of repeats differ, so the optimal length distributions of clone-inserts were not the same. Thus, with an optimal length distribution of clone-inserts, a given genome could be assembled better at lower coverage.
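
As a rough illustration of the principle that copy number changes the probability of appearance in shotgun data, the sketch below assumes that the number of reads covering a given k-mer follows a Poisson distribution whose mean scales with coverage and copy number. It then finds the smallest occurrence count that a single-copy k-mer would rarely reach by chance. The Poisson assumption, the function names, the read length, the k-mer size, and the significance level are all hypothetical choices made for illustration; this is not the authors' model or the MDRmasker algorithm.

```python
import math


def poisson_sf(n, lam):
    """Return P(X >= n) for X ~ Poisson(lam)."""
    term = math.exp(-lam)  # P(X = 0)
    cdf = 0.0
    for i in range(n):
        cdf += term              # accumulate P(X = i)
        term *= lam / (i + 1)    # step to P(X = i + 1)
    return max(0.0, 1.0 - cdf)


def expected_occurrences(coverage, copy_number=1, read_len=500, k=20):
    """Expected number of reads fully containing a k-mer (assumed model)."""
    # A read of length read_len contains (read_len - k + 1) k-mer positions.
    return copy_number * coverage * (read_len - k + 1) / read_len


def mdr_threshold(coverage, read_len=500, k=20, alpha=0.001):
    """Smallest count n such that a single-copy k-mer reaches n
    occurrences with probability below alpha (hypothetical criterion)."""
    lam = expected_occurrences(coverage, 1, read_len, k)
    n = 1
    while poisson_sf(n, lam) >= alpha:
        n += 1
    return n


if __name__ == "__main__":
    # Thresholds rise with coverage, matching the idea that the
    # criteria depend on the shotgun coverage.
    for cov in (0.5, 1.0, 2.0, 6.0):
        print(cov, mdr_threshold(cov))
```

Under these assumptions, a k-mer observed at least `mdr_threshold(coverage)` times would be flagged as a candidate repeat. The threshold rises with coverage, consistent with the abstract's statement that the criteria are inferred separately for different shotgun coverages.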

Journal title:

Volume 1, Issue

Pages -

Publication date: 2003